Abstract. We consider the problem of controller synthesis under imperfect
information in a setting where there is a set of available observable
predicates equipped with a cost function. The problem that we address 
is the computation of a subset of predicates sufficient for control and 
whose cost is minimal. Our solution avoids a full exploration of all possi- 
ble subsets of predicates and reuses some information between different 
iterations. We apply our approach to timed systems. We have developed 
a tool prototype and analyze the performance of our optimization algo- 
rithm on two case studies. 



1 Introduction 

Timed automata, introduced by Alur and Dill [2], are one of the most popular
formalisms for modeling real-time systems. One of the applications of timed automata
is controller synthesis, i.e. the automatic synthesis of a controller strategy that 
forces a system to satisfy a given specification. For timed systems, the controller 
synthesis problem was first solved in [TO], and progress on the algorithm
obtained in [9] has made it possible to apply it to examples of practical
interest. This algorithm has been implemented in the Uppaal-Tiga tool [3],
and applied to several case studies |1I11I12I2"T] . 

The algorithm of [9] assumes that the controller has perfect information about 
the evolution of the system during its execution. However, in practice, it is com- 
mon that the controller acquires information about the state of the system via 
a finite set of sensors, each of them having only finite precision. This motivates
the study of imperfect information games.

The first theoretical results on imperfect information games were obtained
in [23], followed by algorithmic progress and additional theoretical results
in [22], as well as applications to timed games in [6,8]. This paper extends
the framework of [8], and so we consider the notion of stuttering-invariant
observation-based strategies, where the controller chooses an action only
when its observation changes. The observations are defined by the values
of a finite set of observable state predicates. Observable predicates correspond,
for example, to information that can be obtained through sensors by the con- 
troller. In [5], a symbolic algorithm for computing observation-based strategies 
for a fixed set of observable predicates is proposed, and this algorithm has been 
implemented in Uppaal-Tiga. 

In the current paper, we further develop the approach of [8] and consider a set
of available observation predicates equipped with a cost function. Our objective 
is to synthesize a winning strategy that uses a subset of the available observable 
predicates with a minimal cost. Clearly, this can be useful in the design process 
when we need to select sensors to build a controller. 

Our algorithm works by iteratively picking different subsets of the set of 
the available observable predicates, solving the game for these sets of predicates 
and finally finding the controllable combination with the minimal cost. Our 
algorithm avoids the exploration of all possible combinations by taking into 
account the inclusion-set relations between different sets of observable predicates 
and monotonic properties of the underlying games. Additionally, for efficiency 
reasons, our algorithm reuses, when solving the game for a new set of observation 
predicates, information computed on previous sets whenever possible. 

Related works Several works in the literature consider the synthesis of con- 
trollers along with some notion of optimality |5l7l4ll3ll7l24ll4l20j but they con- 
sider the minimization of a cost along the execution of the system while our aim 
is to minimize a static property of the controller: the cost of the observable
predicates on which its winning strategy is built. The closest to our work is [14], where
the authors consider the related but different problem of turning sensors on and
off during the execution in order to minimize energy consumption. In [16], the
authors consider games with perfect information but the discovery of interesting 
predicates to establish controllability. In [15] this idea is extended to games with 
imperfect information. In those two works the set of predicates is not fixed a 
priori, there is no cost involved and the problems that they consider are unde- 
cidable. In J2U], a related technique is used: a hierarchy on different levels of 
abstraction is considered, which allows analysis done on coarser abstractions
to be reused to reduce the state space explored for more precise abstractions.

Structure of the paper In section 2 we define a notion of labeled transition
systems that serves as the underlying formalism for defining the semantics of the
two-player safety games. In the same section we define imperfect information
games and show the reduction of [23] of these games to games with complete
information. Then in section 3 we define timed game automata, which we use as a
modeling formalism. In section 4 we state the cost-optimal controller synthesis
problem and show that a natural extension of this problem (that considers a
simple infinite set of observation predicates) is undecidable. In section 5 we
propose an algorithm and in section 6 we present two case studies.

2 Games with Incomplete Information 

2.1 Labeled Transition Systems 

Definition 1 (Labeled Transition System). A Labeled Transition System
(LTS) $A$ is a tuple $(S, s_{init}, \Sigma, \to)$ where:

– $S$ is a (possibly infinite) set of states,
– $s_{init} \in S$ is the initial state,
– $\Sigma$ is the set of actions,
– $\to \subseteq S \times \Sigma \times S$ is a transition relation; we write $s_1 \xrightarrow{a} s_2$ if $(s_1, a, s_2) \in \to$.

W.l.o.g. we assume that the transition relation is total, i.e. for all states $s \in S$
and actions $a \in \Sigma$, there exists $s' \in S$ such that $s \xrightarrow{a} s'$.

A run of an LTS is a finite or infinite sequence of states $r = (s_0, s_1, \ldots, s_n, \ldots)$
such that $s_i \xrightarrow{a_i} s_{i+1}$ for some action $a_i \in \Sigma$. $r^i$ denotes the prefix run of $r$
ending at $s_i$. We denote by $Runs(A)$ the set of all finite runs of the LTS $A$ and
by $Runs^\omega(A)$ the set of all infinite runs of the LTS $A$.

A state predicate is a characteristic function $\varphi : S \to \{0, 1\}$. We write $s \models \varphi$
iff $\varphi(s) = 1$.

We use LTS as arenas for games: at each round of the game Player I (Controller)
chooses an action $a \in \Sigma$, and Player II (Environment) resolves the
nondeterminism by choosing a transition labeled with $a$. Starting from the state
$s_{init}$, the two players play for an infinite number of rounds, and this interaction
produces an infinite run that we call the outcome of the game. The objective
of Player I is to keep the game in states that satisfy a state predicate $\varphi$; this
predicate typically models the safe states of the system.

More formally, Player I plays according to a strategy $\lambda$ (of Player I), which is
a mapping from the set of finite runs to the set of actions, i.e. $\lambda : Runs(A) \to \Sigma$.
We say that an infinite run $r = (s_0, s_1, s_2, \ldots, s_n, \ldots) \in Runs^\omega(A)$ is consistent
with the strategy $\lambda$ if for all $i \ge 0$ there exists a transition $s_i \xrightarrow{\lambda(r^i)} s_{i+1}$. We
denote by $Outcome(A, \lambda)$ all the infinite runs in $A$ that are consistent with $\lambda$
and start in $s_{init}$. An infinite run $(s_0, s_1, \ldots, s_n, \ldots)$ satisfies a state predicate
$\varphi$ if for all $i \ge 0$, $s_i \models \varphi$. A (perfect information) safety game between Player
I and Player II is defined by a pair $(A, \varphi)$, where $A$ is an LTS and $\varphi$ is a state
predicate that we call a safety state predicate. The safety game problem asks to
determine, given a game $(A, \varphi)$, if there exists a strategy $\lambda$ for Player I such that
all the infinite runs in $Outcome(A, \lambda)$ satisfy $\varphi$.
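For a finite LTS, the safety game problem above can be decided with the standard greatest-fixpoint computation of the winning region. The following is a minimal sketch of that textbook technique, not code from the paper's prototype; the function name `solve_safety` and the dictionary encoding of the transition relation are assumptions made for illustration, and the transition relation is assumed to be total as stated in Definition 1.

```python
def solve_safety(states, actions, trans, safe, init):
    """Decide a finite perfect-information safety game.

    trans maps (state, action) to the set of successor states and is
    assumed total (every pair has at least one successor).
    safe is a predicate on states. Returns (player_one_wins, winning_region).
    """
    # Start from all safe states and repeatedly remove states from which
    # Player I has no action that keeps every successor in the region.
    win = {s for s in states if safe(s)}
    changed = True
    while changed:
        changed = False
        for s in list(win):
            if not any(trans[(s, a)] <= win for a in actions):
                win.discard(s)
                changed = True
    return init in win, win
```

Player I wins from the initial state exactly when it survives this pruning; a memoryless winning strategy picks, in each surviving state, any action whose successors all stay in the region.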

2.2 Observation-Based Stuttering-Invariant Strategies 

In the imperfect information setting, Player I observes the state of the game
using a set of observable predicates $obs = \{\varphi_1, \varphi_2, \ldots, \varphi_m\}$. An observation is a
valuation for the predicates in $obs$, i.e. in a state $s$, Player I is given the subset
of observable predicates that are satisfied in that state. This is defined by the
function $\gamma_{obs}$:

$\gamma_{obs}(s) = \{\varphi \in obs \mid s \models \varphi\}$

We extend the function $\gamma_{obs}$ to sets of states that satisfy the same set of
observation predicates. So, if all the elements of some set of states $v \subseteq S$ satisfy
the same set of observable predicates $o$ (i.e. $\forall s \in v \cdot \gamma_{obs}(s) = o$), then we let
$\gamma_{obs}(v) = o$.

In a game with imperfect information, Player I has to play according to
observation-based stuttering-invariant strategies (OBSI strategies for short). Initially,
and whenever the current observation of the system state changes, Player I
proposes some action $a \in \Sigma$, and this intuitively means that he wants to play the
action $a$ whenever this action is enabled in the system. Player I is not allowed
to change his choice as long as the current observation remains the same.

An Imperfect Information Safety Game (IISG) is defined by a triple $(A, \varphi, obs)$.

Consider a run $r = (s_0, s_1, \ldots, s_n)$, and its prefix $r'$ that contains all the
elements but the last one (i.e. $r = r' \cdot s_n$). A stuttering-free projection $r \downarrow obs$
of a run $r$ over a set of predicates $obs$ is a sequence, defined by the following
inductive rules:

– if $r$ is a singleton (i.e. $n = 0$), then $r \downarrow obs = \gamma_{obs}(s_0)$
– else if $n > 0$ and $\gamma_{obs}(s_{n-1}) = \gamma_{obs}(s_n)$, then $r \downarrow obs = r' \downarrow obs$
– else if $n > 0$ and $\gamma_{obs}(s_{n-1}) \neq \gamma_{obs}(s_n)$, then $r \downarrow obs = r' \downarrow obs \cdot \gamma_{obs}(s_n)$
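The inductive rules above amount to collapsing consecutive states that share the same observation. A minimal sketch follows, assuming (for illustration only) that a run is a Python list of states and that `obs` is a dictionary from predicate names to Boolean functions on states; both encodings and the function names are hypothetical.

```python
def gamma(obs, state):
    """Observation of a state: the set of predicate names it satisfies."""
    return frozenset(name for name, pred in obs.items() if pred(state))


def stutter_free_projection(run, obs):
    """Sequence of observations along a finite run, with stuttering removed."""
    projection = []
    for state in run:
        o = gamma(obs, state)
        # Append only when the observation differs from the previous state's.
        if not projection or projection[-1] != o:
            projection.append(o)
    return projection
```

Two runs on which an obs-OBSI strategy must agree are exactly those for which this function returns the same list.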

Definition 2 ([8]). A strategy $\lambda$ is called obs-Observation Based Stuttering
Invariant (obs-OBSI) if for any two runs $r'$ and $r''$ such that $r' \downarrow obs = r'' \downarrow obs$,
the values of $\lambda$ on $r'$ and $r''$ coincide, i.e. $\lambda(r') = \lambda(r'')$.

We say that Player I wins in IISG $(A, \varphi, obs)$ if there exists an obs-OBSI
strategy $\lambda$ for Player I such that all the infinite runs in $Outcome(A, \lambda)$ satisfy $\varphi$.

2.3 Knowledge Games 

The solution of an IISG $(A, \varphi, obs)$ can be reduced to the solution of a perfect
information safety game $(G, \psi)$, whose states are sets of states in $A$ and represent
the knowledge (beliefs) of Player I about the current possible states of $A$.

We assume that $\varphi \in obs$, i.e. the safety state predicate is observable for
Player I. This is a reasonable assumption since Player I should be able to know
whether he loses the game or not.

Consider an LTS $A = (S, s_{init}, \Sigma, \to)$. We say that a transition $s_1 \xrightarrow{a} s_2$ in $A$
is obs-visible if the states $s_1$ and $s_2$ have different observations (i.e. $\gamma_{obs}(s_1) \neq
\gamma_{obs}(s_2)$); otherwise we call this transition obs-invisible. Let $v \subseteq S$ be a
knowledge (belief) of Player I in $A$, i.e. it is some set of states that satisfy the same
observation. The set $Post_{obs}(v, a)$ contains all the states that are accessible from
the states of $v$ by a finite sequence of $a$-labeled obs-invisible transitions followed
by an $a$-labeled obs-visible transition. More formally, $Post_{obs}(v, a)$ contains all
the states $s'$ such that there exists a run $s_1 \xrightarrow{a} s_2 \xrightarrow{a} \ldots \xrightarrow{a} s_n$ with $s_1 \in v$,
$s_n = s'$, $\gamma_{obs}(s_i) = \gamma_{obs}(s_1)$ for all $1 \le i < n$, and $\gamma_{obs}(s_n) \neq \gamma_{obs}(s_1)$.

The set $Post_{obs}(v, a)$ contains all the states that are visible for Player I
after he continuously offers to play action $a$ from some state in $v$. Player I
can distinguish the states $s_1$ and $s_2$ from $Post_{obs}(v, a)$ iff they have different
observations, i.e. $\gamma_{obs}(s_1) \neq \gamma_{obs}(s_2)$. In other words, the set $\{Post_{obs}(v, a) \cap
\gamma_{obs}^{-1}(o) \mid o \in \mathcal{P}(obs)\} \setminus \{\emptyset\}$ consists of all the beliefs that Player I might have
after he plays the action $a$ from the knowledge set $v$.

A game can diverge in the current observation after playing some action. To
capture this we define the boolean function $Sink_{obs}(v, a)$ whose value is true iff
there exists an infinite run $(s_0, s_1, \ldots, s_n, \ldots) \in Runs^\omega(A)$ such that $s_0 \in v$ and
for each $i \ge 0$ we have $s_i \xrightarrow{a} s_{i+1}$ and $\gamma_{obs}(s_i) = \gamma_{obs}(s_0)$.

Definition 3. We say that a game $(G, \psi)$ is the knowledge game for $(A, \varphi, obs)$
if $G = (V, v_{init}, \Sigma, \to_g)$ is an LTS and

– $V = \{v \in \mathcal{P}(S) \mid \forall s_1, s_2 \in v \cdot \gamma_{obs}(s_1) = \gamma_{obs}(s_2)\} \setminus \{\emptyset\}$ is the set of all the
beliefs of Player I in $A$,
– $v_{init} = \{s_{init}\}$ is the initial game state,
– $\to_g$ represents the game transition relation; a transition $v_1 \xrightarrow{a}_g v_2$ exists iff:
  • $v_2 = Post_{obs}(v_1, a) \cap \gamma_{obs}^{-1}(o)$ and $v_2 \neq \emptyset$ for some $o \subseteq obs$, or
  • $Sink_{obs}(v_1, a)$ is true and $v_2 = v_1$.
– $v \models \psi$ iff $\varphi \in \gamma_{obs}(v)$.

Theorem 1 ([8]). Player I wins in an IISG $(A, \varphi, obs)$ iff he has a winning
strategy in the safety game $(G, \psi)$ which is the knowledge game for $(A, \varphi, obs)$.

This theorem gives us an algorithm for solving an IISG in the case when
its knowledge game is finite and can be automatically constructed.
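For an explicitly given finite LTS, the successors of a belief in the knowledge game can be computed directly from the definitions of $Post_{obs}$ and $Sink_{obs}$. The sketch below is an illustration under stated assumptions, not the symbolic DBM-based procedure of the paper: the transition relation is a dictionary from (state, action) to a set of successors, `obs` maps predicate names to Boolean functions, and all function names are hypothetical.

```python
def gamma(obs, state):
    """Observation of a state: the set of predicate names it satisfies."""
    return frozenset(name for name, pred in obs.items() if pred(state))


def _explore(trans, obs, v, a):
    """Follow a-labeled invisible steps from belief v.

    Returns (region, visible): the states reachable without changing the
    observation, and the targets of the first observation-changing a-step.
    """
    o = gamma(obs, next(iter(v)))
    region, frontier, visible = set(v), list(v), set()
    while frontier:
        s = frontier.pop()
        for t in trans.get((s, a), ()):
            if gamma(obs, t) != o:
                visible.add(t)
            elif t not in region:
                region.add(t)
                frontier.append(t)
    return region, visible


def post_obs(trans, obs, v, a):
    return _explore(trans, obs, v, a)[1]


def sink_obs(trans, obs, v, a):
    """True iff some infinite a-labeled run from v never changes observation."""
    region, _ = _explore(trans, obs, v, a)
    changed = True
    while changed:  # prune states with no invisible a-successor left
        changed = False
        for s in list(region):
            if not any(t in region for t in trans.get((s, a), ())):
                region.discard(s)
                changed = True
    return any(s in region for s in v)


def belief_successors(trans, obs, v, a):
    """Successor beliefs of v under action a, following Definition 3."""
    groups = {}
    for s in post_obs(trans, obs, v, a):
        groups.setdefault(gamma(obs, s), set()).add(s)
    succ = {frozenset(g) for g in groups.values()}
    if sink_obs(trans, obs, v, a):
        succ.add(frozenset(v))
    return succ
```

Starting from the belief containing only the initial state and applying `belief_successors` until no new beliefs appear yields the reachable part of the knowledge game, which can then be solved as an ordinary safety game.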

3 Timed Game Automata 

The knowledge game $(G, \psi)$ for $(A, \varphi, obs)$ is finite when the source game $A$ is
finite [23]. The converse is not true, and there are higher level formalisms that can
induce infinite games for which knowledge games are still finite and can be
automatically constructed. One such formalism is Timed Game Automata [18],
which we use as a modeling formalism and which has been proved in [8] to have
finite state knowledge games.

Let $X$ be a finite set of real-valued variables called clocks. We denote by $C(X)$
the set of constraints $\varphi$ generated by the grammar: $\varphi ::= x \sim k \mid x - y \sim k \mid \varphi \wedge \varphi$
where $k \in \mathbb{N}$, $x, y \in X$ and $\sim \in \{<, \le, =, \ge, >\}$. $B(X)$ is the set of constraints
generated by the following grammar: $\varphi ::= \top \mid k_1 \le x < k_2 \mid \varphi \wedge \varphi$ where
$k, k_1, k_2 \in \mathbb{N}$, $k_1 < k_2$, $x \in X$, and $\top$ is the boolean constant true.

A valuation of the clocks in $X$ is a mapping $X \to \mathbb{R}_{\ge 0}$. For $Y \subseteq X$, we denote
by $v[Y]$ the valuation assigning $0$ (respectively, $v(x)$) for any $x \in Y$ (respectively,
$x \in X \setminus Y$). We also use the notation $\mathbf{0}$ for the valuation that assigns $0$ to each
clock from $X$.

Footnote 5: the powerset $\mathcal{P}(S)$ is equal to the set of all subsets of $S$.

Definition 4 (Timed Game Automata). A Timed Game Automaton (TGA)
is a tuple $(L, l_{init}, X, E, \Sigma_c, \Sigma_u, I)$ where:

– $L$ is a finite set of locations,
– $l_{init} \in L$ is the initial location,
– $X$ is a finite set of real-valued clocks,
– $\Sigma_c$ and $\Sigma_u$ are the finite sets of controllable and uncontrollable actions (of
Player I and Player II, correspondingly),
– $E \subseteq (L \times B(X) \times \Sigma_c \times 2^X \times L) \cup (L \times C(X) \times \Sigma_u \times 2^X \times L)$ is partitioned
into controllable and uncontrollable transitions (see footnote 6),
– $I : L \to B(X)$ associates to each location its invariant.

We first briefly recall the non-game semantics of TGA, that is, the semantics
of Timed Automata (TA) [2]. A state of a TA (and TGA) is a pair $(l, v)$ of a
location $l \in L$ and a valuation $v$ over the clocks in $X$. An automaton can do two
types of transitions, which are defined by the relation $\hookrightarrow$:

– a delay $(l, v) \overset{t}{\hookrightarrow} (l, v')$ for some $t \in \mathbb{R}_{>0}$, $v' = v + t$ and $v' \models I(l)$, i.e. to
stay in the same location while the invariant of this location is satisfied, and
during this delay all the clocks grow at the same rate, and
– a discrete transition $(l, v) \overset{a}{\hookrightarrow} (l', v')$ if there is an element $(l, g, a, Y, l') \in E$,
$v \models g$ and $v' = v[Y]$, i.e. to go to another location $l'$ while resetting the clocks
from $Y$, if the guard $g$ and the invariant of the target location $l'$ are satisfied.

In the remainder of this section, we define the game semantics of TGA. As
in [8], for TGA, we let observable predicates be of the form $(K, \varphi)$, where $K \subseteq L$
and $\varphi \in B(X)$. We say that a state $(l, v)$ satisfies $(K, \varphi)$ iff $l \in K$ and $v \models \varphi$.

Intuitively, whenever the current observation of the system state changes,
Player I proposes a controllable action $a \in \Sigma_c$, and as long as the observation
does not change Player II has to play this action when it is enabled, and otherwise
he can play any uncontrollable action or let time elapse. Player I can also propose
a special action skip, which means that he lets Player II play any uncontrollable
action or let time elapse. Any time delay is stopped as soon as the
current observation changes, thus giving Player I the possibility to choose
another action to play.

Formally, the semantics of TGA is defined as follows:

Definition 5. The semantics of a TGA $M = (L, l_{init}, X, E, \Sigma_c, \Sigma_u, I)$ with the set of
observable predicates $obs$ is defined as the LTS $(S, s_{init}, \Sigma_c \cup \{skip\}, \to)$, where
$S = L \times \mathbb{R}_{\ge 0}^X$, $s_{init} = (l_{init}, \mathbf{0})$ and the transition relation is defined as follows
($\hookrightarrow$ denotes the non-game semantics of $M$):

– $s \xrightarrow{skip} s'$ exists iff $s \overset{a_u}{\hookrightarrow} s'$ for some $a_u \in \Sigma_u$, or there exists a delay
$s \overset{t}{\hookrightarrow} s'$ for some $t \in \mathbb{R}_{>0}$ and any smaller delay doesn't change the current
observation (i.e. if $s \overset{t'}{\hookrightarrow} s''$ and $0 \le t' < t$ then $\gamma_{obs}(s) = \gamma_{obs}(s'')$).
– for $a \in \Sigma_c$, $s \xrightarrow{a} s'$ exists iff:
  • $a$ is enabled in $s$ and there exists a discrete transition $s \overset{a}{\hookrightarrow} s'$, or
  • $a$ is not enabled in $s$, but there exists a discrete transition $s \overset{a_u}{\hookrightarrow} s'$ for
some $a_u \in \Sigma_u$, or
  • there exists a delay $s \overset{t}{\hookrightarrow} s'$ for some $t \in \mathbb{R}_{>0}$, and for any smaller
delay $s \overset{t'}{\hookrightarrow} s''$ (where $0 \le t' < t$) the observation is not changed, i.e.
$\gamma_{obs}(s) = \gamma_{obs}(s'')$, and action $a$ is not enabled in $s''$.

Footnote 6: We follow the definition of [8], which also assumes that the guards of the controllable
transitions should be of the form $k_1 \le x < k_2$. This allows us to use the results from
that paper. In particular, we use urgent semantics for the controllable transitions,
i.e. for any controllable transition there is an exact moment in time when it becomes
enabled.

For a given TGA $M$, a set of observable predicates $obs$ and a safety state
predicate $\varphi$ (that can again be of the form $(K, \psi)$), we say that Player I wins in
the Imperfect Information Safety Timed Game (IISTG) $(M, \varphi, obs)$ iff he wins
in the IISG $(A, \varphi, obs)$, where $A$ defines the semantics for $M$ and $obs$.

The problem of solving an IISTG is decidable since the knowledge games
are finite for TGA [8]. The paper [8] proposes a symbolic Difference Bounded
Matrices (DBM)-based procedure to construct them.

4 Problem Statement 

Suppose that several observable predicates are available, with assigned costs,
and that we look for a set of observable predicates that allows controllability and whose
cost is minimal. This is formalized in the next definition:

Definition 6. Consider a TGA $M$, a finite set of available observable predicates
$Obs$ over $M$, a safety observable predicate $\varphi \in Obs$ and a cost function
$\omega : \mathcal{P}(Obs) \to \mathbb{R}_{\ge 0}$ that is monotonic with respect to set inclusion. The optimization
problem for $(M, \varphi, Obs, \omega)$ consists in computing a set of observable predicates
$obs \subseteq Obs$ such that Player I wins in the Imperfect Information Safety Timed
Game $(M, \varphi, obs)$ and $\omega(obs)$ is minimal.

We present in the next section our algorithm to compute a solution to the
optimization problem. In this paper, we restrict our attention to finite sets of
available predicates. We justify this restriction by the following undecidability
result: considering a reasonable infinite set of observation predicates, the easier
problem of the existence of a set of predicates allowing controllability is
undecidable (the proof is given in the Appendix):

Theorem 2. Consider a TGA $M$ with clocks $X$, an (infinite) set of available
predicates $Obs = \{x < \frac{1}{q} \mid x \in X, q \in \mathbb{N}, q \ge 1\}$ and the safety objective $\varphi$.
Determining whether there exists a finite set of predicates $obs \subseteq Obs$ such that
Player I wins in IISTG $(M, \varphi, obs)$ is undecidable.



Algorithm 1 Lattice-based algorithm

// input: TGA M, a set of observable predicates Obs, a safety predicate φ, a cost function ω
// output: a solution with a minimal cost
function Optimize(M, φ, Obs, ω):
    candidates := P(Obs)    // initially, candidates contains all subsets of Obs
    best_candidate := None
    while candidates ≠ ∅:
        pick obs ∈ candidates
        if Solve(M, φ, obs):
            best_candidate := obs
            candidates := candidates \ {c : c ∈ P(Obs) ∧ ω(c) ≥ ω(obs)}
        else:
            candidates := candidates \ {c : c ∈ P(Obs) ∧ c ⊆ obs}
    return best_candidate



5 The Algorithm 

The naive algorithm is to iterate through all the possible solutions $\mathcal{P}(Obs)$, for
each $obs \in \mathcal{P}(Obs)$ solve IISTG $(M, \varphi, obs)$ via the reduction to the finite-state
knowledge games, and finally pick a solution with the minimal cost.

In section 5.1 we propose a more efficient algorithm that avoids exploring
all the possible solutions from $\mathcal{P}(Obs)$. Additionally, in section 5.2 we describe
an optimization that reuses information between different iterations.

5.1 Basic Exploration Algorithm 

Consider that we have already solved the game for the observable predicate sets
$obs_1, obs_2, \ldots, obs_n$ and obtained the results $r_1, r_2, \ldots, r_n$, where $r_i$ is either true
or false, depending on whether Player I wins in IISTG $(M, \varphi, obs_i)$ or not.

From now on we don't have to consider any set of observable predicates
with a cost larger than or equal to the cost of the optimal solution found so far.
Additionally, if we know that Player I loses for the set of observable predicates
$obs_i$ (i.e. $r_i = false$), then we can conclude that he also loses for any coarser
set of observable predicates $obs \subseteq obs_i$ (since in this case Player I has less
observation power). Therefore we don't have to consider such $obs$ as a solution
to our optimization problem. This can be formalized by the following definition:

Definition 7. A sequence $(obs_1, r_1), (obs_2, r_2), \ldots, (obs_n, r_n)$ is called a non-redundant
sequence of solutions for a set of available observable predicates $Obs$ and cost
function $\omega$ if for any $1 \le i \le n$ we have $obs_i \subseteq Obs$, $r_i \in \{true, false\}$, and for
any $j < i$ we have:

– $\omega(obs_j) > \omega(obs_i)$ if $r_j = true$,
– $obs_i \not\subseteq obs_j$, otherwise.




Fig. 1: a) The original LTS $A$ and two observable predicates $\varphi_1$ and $\varphi_2$,
b) the knowledge game $G_f$ for $A$ with observable predicates $\{\varphi_1, \varphi_2\}$,
c) the knowledge game $G_c^l$ for $A$ with observable predicates $\{\varphi_1\}$,
d) the knowledge game $G_c^h$ for $G_f$ with observable predicates $\{\varphi_1\}$

Algorithm 1 solves the optimization problem by iteratively solving the game
for different sets of observable predicates so that the resulting sequence of
solutions is non-redundant. The procedure $Solve(M, \varphi, obs)$ uses the knowledge
game reduction technique described in section 3. The algorithm updates the set
candidates after each iteration, and when the algorithm finishes, the best_candidate
variable contains a reference to the solution with the minimal cost.
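The lattice-based exploration can be sketched in a few lines of Python with an explicitly stored candidate set. This is an illustration under stated assumptions rather than the prototype's code: `solve` stands in for the procedure Solve (any monotone Boolean oracle on predicate sets will do), `pick` is a hypothetical hook for the exploration order, and candidates with a cost greater than or equal to that of a winning set are pruned, as in the discussion of section 5.1.

```python
from itertools import combinations


def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def optimize(available, cost, solve, pick=None):
    """Return a cheapest subset of `available` accepted by `solve`, or None.

    cost:  monotone function from frozensets of predicates to numbers.
    solve: oracle telling whether Player I wins with the given predicates.
    pick:  exploration order; defaults to "cheap first".
    """
    if pick is None:
        pick = lambda cands: min(cands, key=cost)
    candidates = set(powerset(available))
    best = None
    while candidates:
        obs = pick(candidates)
        if solve(obs):
            best = obs
            bound = cost(obs)
            # Nothing at least as expensive as a known solution is useful.
            candidates = {c for c in candidates if cost(c) < bound}
        else:
            # Coarser observations cannot win either.
            candidates = {c for c in candidates if not c <= obs}
    return best
```

Each iteration removes at least the picked set from the candidates, so the loop terminates after at most $2^{|Obs|}$ calls to the oracle and usually far fewer.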

Algorithm 1 doesn't state in which order we should navigate through the set
of candidates. We propose the following heuristics:

– cheap first (and expensive first): pick any element from the candidates
with the minimal (or maximal) cost,
– random: pick a random element from the candidates,
– midpoint: pick any element that will allow us to eliminate as many elements
from the candidates set as possible. In other words, we pick an
element $obs$ that maximizes the value of

$\min(|\{c : c \in candidates \wedge \omega(c) \ge \omega(obs)\}|, |\{c : c \in candidates \wedge c \subseteq obs\}|)$.
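The midpoint heuristic can be written as a scoring function over the current candidates: for each candidate it counts how many sets would be pruned if the game were won and how many if it were lost, and keeps the worse of the two. The sketch below is a straightforward quadratic-time illustration; the function name `midpoint_pick` and the representation of candidates as frozensets are assumptions.

```python
def midpoint_pick(candidates, cost):
    """Pick the candidate whose guaranteed pruning is largest."""
    def score(obs):
        # Removed if Player I wins with obs: everything at least as costly.
        pruned_if_win = sum(1 for c in candidates if cost(c) >= cost(obs))
        # Removed if Player I loses with obs: every subset of obs.
        pruned_if_lose = sum(1 for c in candidates if c <= obs)
        return min(pruned_if_win, pruned_if_lose)
    return max(candidates, key=score)
```

On the four subsets of a two-element set with cardinality as cost, the empty set and the full set each guarantee only one removal, while either singleton guarantees two, so a singleton is picked.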

Algorithm 1 doesn't specify how we store the set of possible solutions candidates.
An explicit way (i.e. storing all elements) is expensive, because the candidates set
initially contains $2^{|Obs|}$ elements. However, an efficient procedure for obtaining
a next candidate may not exist, as a consequence of the following theorem, which
is proved in Appendix B.

Theorem 3. Let $seq_n = (obs_1, r_1), (obs_2, r_2), \ldots, (obs_n, r_n)$ be a non-redundant
sequence of solutions for some set $Obs$ and cost function $\omega : \mathcal{P}(Obs) \to \mathbb{R}_{\ge 0}$.
Consider that the value of $\omega$ can be computed in polynomial time. Then the
problem of determining whether there exists a one-element extension
$seq_{n+1} = (obs_1, r_1), (obs_2, r_2), \ldots, (obs_n, r_n), (obs_{n+1}, r_{n+1})$ of $seq_n$ that is still
non-redundant for $Obs$ and $\omega$ is NP-complete.

5.2 State space reusage from finer observations 

Intuitively, if we have already solved a knowledge game $(G_f, \psi_f)$ for a set $obs_f$ of
observable predicates, then we can view a knowledge game $(G_c, \psi_c)$ associated
with a coarser set of observable predicates $obs_c \subseteq obs_f$ as an imperfect
information game with respect to $(G_f, \psi_f)$. Thus we can solve the knowledge game
for $obs_c$ without exploring the state space of the TGA $M$ and therefore without
using the expensive DBM operations. Moreover, we can build another game on
top of $G_c$ (for an observable predicates set that is coarser than $obs_c$) and thus
construct a "Russian nesting doll" of games. This is an important contribution
of our paper, since this construction can be applied not only to Timed Games,
but also to any modeling formalism that has finite knowledge games.

The state space reusage method is demonstrated on a simple LTS $A$ in Fig. 1.
Suppose that we have already built the knowledge game $G_f$ for the observable
predicates $\{\varphi_1, \varphi_2\}$. Now, if we want to build a knowledge game for $\{\varphi_1\}$, we can do
that in two ways. First, we can build it from scratch based on the state space of
$A$, and the resulting knowledge game $G_c^l$ is given in subfigure c. Alternatively,
we can build the knowledge game $G_c^h$ on top of $G_f$ (see subfigure d). The
states of $G_c^l$ are sets of states of $A$ and the states of $G_c^h$ are sets of sets of states
of $A$. The games $G_c^l$ and $G_c^h$ are bisimilar, thus Player I wins in $G_c^l$ iff he wins
in $G_c^h$ (for any safety predicate). The latter is true for any LTS $A$, as stated
by the following theorem and corollary (which are proved in Appendix C):

Theorem 4. Suppose that $obs_c \subseteq obs_f$, $(G_f, \psi_f)$ is the knowledge game for
$(A, \varphi, obs_f)$, $(G_c^l, \psi_c^l)$ is the knowledge game for $(A, \varphi, obs_c)$ and $(G_c^h, \psi_c^h)$ is the
knowledge game for $(G_f, \psi_f, obs_c)$. Then the relation $R = \{(v, v') \mid v = \bigcup_{s' \in v'} s'\}$
between the states of $G_c^l$ and $G_c^h$ is a bisimulation.

Corollary 1. Player I wins in $(G_c^l, \psi_c^l)$ iff Player I wins in $(G_c^h, \psi_c^h)$.

This reusage method is also correct for the case when an input model is 
defined as a TGA (since we can apply the theorem to the underlying LTS). 

Implementation Our Python prototype implementation of this algorithm (see 
https://launchpad.net/pytigaminobs) explicitly stores the set of candidates 
and uses the on-the-fly DBM-based algorithm of [5] for the construction and 
solution of knowledge games for IISTG (the algorithm stops early when it detects 
that the initial state is losing). 

6 Case studies 

We applied our implementation to two case studies. 

The first is "Train-Gate Control", where two train tracks merge together
on a bridge and the goal of the controller is to prevent a collision. The trains
can arrive in any order (or not arrive at all), thus the challenge for the controller
is to handle all possible cases.

The second is "Light and Heavy boxes" , where a box is being processed on 
the conveyor in several steps, and the goal of the controller is to move the box to 
the next step within some time bound after it has been processed at the current 
step. 
Fig. 2: A model of a single train



6.1 Train- Gate control 

The model of a single (first) train is depicted in Fig. 2. There are two semaphore
lights before the bridge on each track. A train passes the distance between
semaphores within 1 to 2 time units. A controller can switch the semaphores
to red (actions stop1 and stop2 depending on the track number), and to green
(actions go1 and go2). These semaphores are intended to prevent the trains from
colliding on the bridge. When the red signal is illuminated, a train will stop at
the next semaphore and wait for the green signal.

It is possible to mount sensors on the semaphores, and these sensors will
detect if a train approaches the semaphore. This is modeled with observable
predicates (pos1 > 1), (pos2 > 1), (pos1 > 2) and (pos2 > 2).
(a) Running time (the average is computed over 10 runs)
(b) The average number of iterations



Fig. 3: Results for the Train-Gate model 



The controller has a discrete timer that is modeled using the clock y. At any 
time this clock can be reset by the controller (action reset). There is an available 
observable predicate (y < 2) that becomes false when the value of y reaches 2. 
This allows the controller to measure time with a precision 2 by resetting y each 
time this predicate becomes false and counting the number of such resets. 

The integer variable critical contains the number of trains that are currently
on the bridge. The safety property is that no more than one train can be in the
critical section (bridge) at the same time and the trains should not be stopped
for more than 2 time units:

$(critical < 2) \wedge (Train1.STOPPED \to (x1 < 2)) \wedge (Train2.STOPPED \to (x2 < 2))$

The optimal controller uses the following set of observable predicates: (pos1 >
2), (pos2 > 2) and (y < 2). Such a controller waits until the second (in time)
train comes to the second semaphore, then pauses this train and lets it go after
2 time units.

Figure 3a reports the time needed to find this solution for different parameters
of the algorithm. Figure 3b contains the average number of iterations of
Algorithm 1 (i.e. game checks for different sets of observable predicates). It
requires only a fraction of the total number of all possible solutions,
$2^5 = 32$. Additionally, the state space reusage heuristic improves
the performance, especially for the "expensive first" exploration order. For this
model the most efficient way to solve the optimization problem is to first solve
the game with all the available predicates being observed, and then always reuse
the state space of this knowledge game. The numbers 0 and 1 in Figure 3b
reflect that we don't reuse the state space exactly once for the "expensive first"
order, and we never reuse the state space for the "cheap first" exploration order.

The game size ranges from 5 states for the game when only the safety state 
predicate is observable to 9202 for the case when all the available predicates are 
observable. The number of the symbolic states of TGA (i.e. different pairs of 
reachable locations and DBMs that form the states of a knowledge game) ranges 
from 1297 to 31171, correspondingly. 

6.2 Light and Heavy Boxes 



Consider a conveyor belt on which Light and Heavy boxes can be put. A
box is processed in n steps (n is a parameter of the model), and the processing
at each step takes from 1 to 2 time units for the Light boxes, and from 4 to
5 time units for the Heavy boxes. The goal of the controller is to move a box
to the next step (by rotating the conveyor, with an action move) within 3 time
units after the box has been processed at the current step. At the last step the
controller should remove (action remove) the box from the conveyor within 3
time units. If the controller rotates the conveyor too early (before the box has
been processed), too late (after more than 3 time units), or does not move it
at all, then the Controller loses (the same is true for the removal of the box at
the last step). Additionally, the controller should not rotate the conveyor when
there is no box on it, and should not try to remove the box when the box is not
at the last step. Our model is depicted in Fig. 4, and the goal of the controller
is to avoid the BAD location.

Fig. 4: Light and heavy boxes model

Fig. 5: Average running time (SSR stands for State Space Reusage)

A box can arrive on the conveyor at any time, and there is an observable
predicate (pos = 0) with cost 1 which becomes true when the box is put on
the conveyor. Additionally, there is a predicate (heavy = true) with cost 1 that
becomes true if a heavy box arrives. The model is cyclic, i.e. another box can be
put on the conveyor after the previous box has been removed from it.

As in the Traingate model, the controller can measure time using a special 
clock y. We assume that a controller can measure time with different granularity, 
and more precise clocks cost more. We model this by having three available 
observable predicates: (y < 1) with cost 3, (y < 2) with cost 2, and (y < 3) with 
cost 1. 

A naive controller works with the observable predicates {(heavy = true), (pos = 
0), (y < 1)}, resets the clock y each time a new box arrives, and then moves 
it to the next step (or removes it after the last step) every 2 time units if the 
box is light and every 5 time units if the box is heavy. However, it is not 
necessary to use the expensive (y < 1) observable predicate, since a controller can 
move a box every 3 time units (6 for a heavy box); thus a time granularity 
of 3 is enough, and there is a controller that uses the observable predicates 
{(heavy = true), (pos = 0), (y < 3)}. Our implementation detects this optimal 
solution, and Fig. 5 shows the average time needed to compute this 
solution for different numbers of box processing steps n. The results show that the 
state space reusage heuristic improves the performance of the algorithm. 
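
The timing argument above can be checked with a small calculation. The following Python sketch is only an illustration and not part of the tool: it assumes that time is measured from the moment a box enters a step, that a controller observing (y < g) can act only at multiples of g, and that both bounds of the move window are inclusive. The names PROCESSING, DEADLINE, CLOCK_COST, safe_window, has_safe_tick and cheapest_sufficient_clock are hypothetical.

```python
# Processing-time bounds per step, taken from the model description.
PROCESSING = {"light": (1, 2), "heavy": (4, 5)}
# A box must be moved within 3 time units after it has been processed.
DEADLINE = 3
# Costs of the clock predicates (y < g), keyed by the granularity g.
CLOCK_COST = {1: 3, 2: 2, 3: 1}


def safe_window(kind):
    """Times (since the box entered the step) at which a move is always safe."""
    fastest, slowest = PROCESSING[kind]
    # Not before the slowest processing ends, not after the deadline
    # that follows the fastest processing.
    return slowest, fastest + DEADLINE


def has_safe_tick(granularity, kind):
    """Assumption: the controller may act only at multiples of `granularity`."""
    start, end = safe_window(kind)
    first_tick = -(-start // granularity) * granularity  # round up to a multiple
    return first_tick <= end


def cheapest_sufficient_clock():
    """Return (granularity, cost) of the cheapest clock predicate that works."""
    sufficient = [g for g in CLOCK_COST
                  if all(has_safe_tick(g, kind) for kind in PROCESSING)]
    best = min(sufficient, key=lambda g: CLOCK_COST[g])
    return best, CLOCK_COST[best]


if __name__ == "__main__":
    print(safe_window("light"), safe_window("heavy"))
    print(cheapest_sufficient_clock())
```

Under these assumptions a granularity of 3 gives the ticks 3 (light) and 6 (heavy), both inside their windows, which matches the observation that the cheapest clock predicate (y < 3) suffices.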

The game size for this model ranges from 4 knowledge game states and 51 
symbolic NTA states, when there are 2 processing steps and only the safety predicate 
is observable, to 6417 knowledge game states and 15554 symbolic NTA states for 
9 processing steps when all the available predicates are observable. 

7 Conclusions 

In this paper we have developed, implemented and evaluated an algorithm for the 
cost-optimal controller synthesis for timed systems, where the cost of a controller 
is defined by its observation power. 

Our main contributions are two optimizations: one that avoids exploring 
all possible solutions, and one that reuses the state space and solves the 
imperfect information games on top of each other. Our experiments show that 
these optimizations improve the performance of the algorithm. 

In the future, we plan to apply our method to other modeling formalisms 
that have finite-state knowledge games. 

A Proof of Theorem 2 

Theorem 2. Consider a TGA M with clocks X, an (infinite) set of available 
predicates $Obs = \{x < \frac{1}{q} \mid x \in X,\ q \in \mathbb{N},\ q \geq 1\}$, and a safety objective $\varphi$. 
Determining whether there exists a finite set of predicates $obs \subseteq Obs$ such that 
Player I wins in $IISTG(M, \varphi, obs)$ is undecidable. 

Proof. The proof below adapts the construction used in the proof of the undecidability 
of the existence of a sampling rate allowing controllability of a timed automaton 
w.r.t. a safety objective, proved in [TU]. 

The proof is by reduction from boundedness of 2-counter machines. It is based 
on the encoding of such a machine into a timed automaton M described in [TU]. 
We do not recall this construction here. Formally, we write $M = (L, l_{init}, X, E, \Sigma_c, \Sigma_u, \ldots)$ 
and recall that the construction involves a location over. The main property of 
this construction used in this proof is the following: given a rational number 
$\frac{1}{b} \in \mathbb{Q}_{>0}$ and denoting $k = \lfloor b \rfloor$, the counters of M never exceed value k if and 
only if location over is not reachable in the semantics of M sampled by $\frac{1}{b}$. In 
addition, we slightly modify the construction as follows: we add a fresh clock 
z which, along every transition, is systematically reset and checked to be positive. 
Note that this has no effect on the sampled semantics of the automaton, 
whatever the value of the sampling rate. We also consider that all transitions are 
controllable. 

We consider as the safety objective the set $(L \setminus \{over\}, \mathbb{R}_{\geq 0}^{X})$. The set of 
observable predicates is defined as $Obs_1 \uplus Obs_2$, where $Obs_1$ is the set of predicates 
$(\ell, \mathbb{R}_{\geq 0}^{X})$, for every location $\ell \in L$, and $Obs_2$ is the set of predicates $(L, z < \frac{1}{q})$, 
where $q \in \mathbb{N}^*$. We will show that there exists a finite set of predicates for which 
the system is controllable if and only if the 2-counter machine is bounded. 

Assume the machine is bounded, say by value k. Thanks to the property 
of M recalled above, the semantics of M for the sampling rate $\frac{1}{k}$ never enters 
location over, and thus satisfies the safety objective. We prove that the system is 
controllable for the (finite) set of predicates $Obs_1 \cup \{(L, z < \frac{1}{k})\}$. The strategy 
of the controller is as follows: it alternates between delays (action skip) and 
discrete actions. As clock z is reset on every transition, this allows it to simulate 
a sampled behavior for sampling rate $\frac{1}{k}$. Indeed, after each discrete step, the 
value of clock z is zero, and thus the predicate $(L, z < \frac{1}{k})$ will become false 
exactly after $\frac{1}{k}$ time units. At this time, the controller proposes an action a 
which is enabled. The outcome of this strategy is a run of M under the sampled 
semantics, for the sampling rate $\frac{1}{k}$. In particular, this run never enters location 
over. 

Conversely, we proceed by contradiction and assume that the machine is not 
bounded and that there exists a finite subset obs of $Obs_1 \uplus Obs_2$ which allows 
to control the system. Let k denote the least common multiple of the integers 
q such that the predicate $(L, z < \frac{1}{q})$ belongs to obs. Since we have modified 
the machine by requiring some positive delay to elapse between two discrete 
actions, and by the semantics of stuttering-invariant strategies, we know that 
all the actions of the controller will be played on ticks of the sampling $\frac{1}{k}$ (but not 
necessarily on all of them: the controller could propose to wait, or could use a 
predicate whose threshold is a multiple of $\frac{1}{k}$). As a consequence, the controlled behavior 
of M will be a subset of the sampled semantics of M w.r.t. sampling unit $\frac{1}{k}$. 
This constitutes a contradiction since, as the machine is not bounded and by 
properties of the construction of M, any sampled behavior of M will eventually 
either be blocked or reach location over. □ 
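
The role of the least common multiple in the converse direction can be illustrated numerically. The sketch below (Python, illustrative only; the function names sampling_unit and ticks_per_threshold are hypothetical) takes the integers q of the predicates (L, z < 1/q) in obs, computes k as their least common multiple, and expresses each threshold 1/q as a number of ticks of length 1/k. Every threshold is then a whole number of ticks, which is the fact the argument relies on.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def sampling_unit(denominators):
    """Return 1/k, where k is the lcm of all q with (L, z < 1/q) in obs."""
    k = reduce(lcm, denominators, 1)
    return Fraction(1, k)


def ticks_per_threshold(denominators):
    """Express every threshold 1/q as a multiple of the common unit 1/k."""
    unit = sampling_unit(denominators)
    return {q: Fraction(1, q) / unit for q in denominators}


if __name__ == "__main__":
    qs = [2, 3, 4]
    print(sampling_unit(qs))
    for q, ticks in ticks_per_threshold(qs).items():
        # Each threshold falls exactly on a tick of the common sampling.
        print(q, ticks, ticks.denominator == 1)
```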

B Proof of Theorem 3 

Theorem 3. Let $seq_n = (obs_1, r_1), (obs_2, r_2), \dots, (obs_n, r_n)$ be a non-redundant 
sequence of solutions for some set Obs and cost function $\omega : \mathcal{P}(Obs) \to \mathbb{R}_{\geq 0}$. 
Assume that the value of $\omega$ can be computed in polynomial time. Then the 
problem of determining whether there exists a one-element extension 
$seq_{n+1} = (obs_1, r_1), (obs_2, r_2), \dots, (obs_n, r_n), (obs_{n+1}, r_{n+1})$ of $seq_n$ that is still 
non-redundant for Obs and $\omega$ is NP-complete. 

Proof. First, we show that the problem is in NP. Indeed, a proof certificate is the 
sequence $seq_{n+1}$ itself, and its validity can be checked in polynomial time. 

Next, we demonstrate NP-hardness by a reduction from the 
vertex cover problem. Formally, a vertex cover of a graph $G = (V, E)$ is a set 
$C \subseteq V$ of vertices such that each edge of G is incident to at least one vertex 
in C. The vertex cover decision problem is to determine, for a given G and k, 
whether there exists a vertex cover C of the graph G of size at most k. 
This problem is known to be NP-complete. 

Suppose that we are given a graph $(V, E)$ and a constant k, and we want to 
check whether there exists a vertex cover of size at most k. Let $|V| = m$, 
$|E| = n$, and $E = \{e_1, e_2, \dots, e_n\}$. We choose the set Obs to be equal to 
$V \cup \{o_c\}$, where $o_c \notin V$ is a special element. We define the value of the cost 
function $\omega(obs)$ to be equal to $|obs| + k$ if $o_c \in obs$ and to $|obs|$ 
otherwise. Consider a set $O = \{obs_1, obs_2, \dots, obs_n\}$ of subsets of Obs, where 
$|O| = |E|$ and $obs_i$ contains all the vertices from V that are not incident to the 
edge $e_i$. We order the elements of O in a sequence $(obs_{a_1}, obs_{a_2}, \dots, obs_{a_n})$ 
such that $|obs_{a_i}| \leq |obs_{a_j}|$ for any $i < j$. All the sets of this sequence are pairwise 
different (i.e. $obs_{a_i} \neq obs_{a_j}$ for $i \neq j$), and thus $obs_{a_i} \not\supseteq obs_{a_j}$ for any $i < j$. 

Let us consider the sequence 

$seq_n = (obs_{a_1}, false), (obs_{a_2}, false), \dots, (obs_{a_n}, false), (\{o_c\}, true)$ 

First, it can easily be seen that this sequence is non-redundant for the chosen 
Obs and $\omega$ (since $obs_{a_i} \not\supseteq obs_{a_j}$ for any $i < j$). 



Second, the sequence $seq_n$ can be extended by one element such that the 
resulting sequence is still non-redundant iff there exists a vertex cover of size at 
most k for the graph $(V, E)$. Indeed, such an extension exists iff there exists a set 
$obs \subseteq Obs$ such that $\omega(obs) < \omega(\{o_c\})$ and $obs \not\subseteq obs_i$ for all $i = 1, \dots, n$. Note that 
$\omega$ is monotonic, and therefore $\omega(obs) < \omega(\{o_c\})$ implies that $o_c \notin obs$, and thus 
$\omega(obs) = |obs|$. Therefore, we should have $|obs| < k + 1$ (since $\omega(\{o_c\}) = k + 1$), 
and for any i there should be some vertex v that belongs to obs and does not 
belong to $obs_i$. The latter is equivalent to the fact that there exists a set of 
vertices obs of size at most k such that every edge from E is incident to at least one 
vertex in obs. This proves NP-hardness. □ 
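
To make the reduction concrete, the following Python sketch builds the instance described above for a small graph and compares, by brute force, the existence of a suitable extension set obs with the existence of a vertex cover of size at most k. It is an illustration under stated assumptions, not the authors' implementation: the names O_C, build_instance, extension_exists and has_vertex_cover are hypothetical, and the extension condition is taken directly from the proof (the cost of obs is below the cost of {o_c}, and obs is not a subset of any obs_i).

```python
from itertools import combinations

O_C = "o_c"  # the special element added to V (hypothetical name)


def build_instance(vertices, edges, k):
    """Build Obs, the cost function and the sets obs_i of the reduction."""
    universe = set(vertices) | {O_C}

    def cost(obs):
        # |obs| + k if o_c is in obs, |obs| otherwise.
        return len(obs) + k if O_C in obs else len(obs)

    # obs_i: all vertices of V that are not incident to the edge e_i.
    losing_sets = [set(vertices) - set(edge) for edge in edges]
    return universe, cost, losing_sets


def extension_exists(vertices, edges, k):
    """Brute-force search for obs with cost(obs) < cost({o_c}), obs not in any obs_i."""
    universe, cost, losing_sets = build_instance(vertices, edges, k)
    bound = cost({O_C})
    for size in range(len(universe) + 1):
        for candidate in combinations(list(universe), size):
            obs = set(candidate)
            if cost(obs) < bound and all(not obs <= ls for ls in losing_sets):
                return True
    return False


def has_vertex_cover(vertices, edges, k):
    """Brute-force check for a vertex cover of size at most k."""
    for size in range(k + 1):
        for candidate in combinations(list(vertices), size):
            if all(set(edge) & set(candidate) for edge in edges):
                return True
    return False


if __name__ == "__main__":
    triangle_v = ["a", "b", "c"]
    triangle_e = [("a", "b"), ("b", "c"), ("a", "c")]
    for k in (1, 2):
        print(k, extension_exists(triangle_v, triangle_e, k),
              has_vertex_cover(triangle_v, triangle_e, k))
```

The search is exponential in the number of vertices and is meant only to check the equivalence on tiny graphs.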

C Proof of Theorem 4 

Theorem 4. Suppose that $obs_c \subseteq obs_f$, $(G_f, \psi_f)$ is the knowledge game for 
$(A, \varphi, obs_f)$, $(G_c^1, \psi_c^1)$ is the knowledge game for $(A, \varphi, obs_c)$ and $(G_c^2, \psi_c^2)$ is the 
knowledge game for $(G_f, \psi_f, obs_c)$. Then the relation $R = \{(v, v') \mid v = \bigcup_{s' \in v'} s'\}$ 
between the states of $G_c^1$ and $G_c^2$ is a bisimulation. 

Proof. Suppose that $G_c^1 = (V, v_{init}, \Sigma, \rightarrow_g)$ and $G_c^2 = (V', v'_{init}, \Sigma, \rightarrow'_g)$. 

First, we have $v_{init} = \{s_{init}\}$ and $v'_{init} = \{\{s_{init}\}\}$, and thus $(v_{init}, v'_{init}) \in R$. 
Consider an action $a \in \Sigma$ and a pair of bisimilar states $v_{pred} \in V$ and 
$v'_{pred} \in V'$ (i.e. $(v_{pred}, v'_{pred}) \in R$). It is easy to see that $\gamma_{obs}(v_{pred}) = \gamma_{obs}(v'_{pred})$. 

Now we will demonstrate that for any a-successor of $v_{pred}$ there is a bisimilar 
a-successor of $v'_{pred}$ and vice versa. More precisely, we will show that for any 
game states $v_{succ} \in V$ and $v'_{succ} \in V'$, if $\gamma_{obs}(v_{succ}) = \gamma_{obs}(v'_{succ})$ and there are 
transitions $v_{pred} \xrightarrow{a}_g v_{succ}$ and $v'_{pred} \xrightarrow{a}'_g v'_{succ}$, then $v_{succ} = \bigcup_{v' \in v'_{succ}} v'$ and 
thus $(v_{succ}, v'_{succ}) \in R$. 

Suppose that $\gamma_{obs}(v_{succ})$ and $\gamma_{obs}(v'_{succ})$ are equal to $obs_{succ}$, and $\gamma_{obs}(v_{pred})$ 
and $\gamma_{obs}(v'_{pred})$ are equal to $obs_{pred}$. 

First, it is easy to see that an LTS state s belongs to the game state $v_{succ}$ 
iff there exists a sequence of transitions $s_1 \xrightarrow{a} s_2 \xrightarrow{a} \dots \xrightarrow{a} s_n \xrightarrow{a} s$, where 
$s_1 \in v_{pred}$, $\gamma_{obs}(s) = obs_{succ}$ and $\gamma_{obs}(s_i) = obs_{pred}$ for any $i \leq n$. We call such 
a sequence a proof sequence for s in $G_c^1$. 

Again, it is easy to see that an LTS state s' belongs to the set $\bigcup_{v' \in v'_{succ}} v'$ iff 
there exists a sequence of game transitions $v'_1 \xrightarrow{a}_g \dots \xrightarrow{a}_g v'_n \xrightarrow{a}_g v''$ such that 
$v'_1 \in v'_{pred}$, $s' \in v''$, $\gamma_{obs}(v'') = obs_{succ}$ and for any $i \leq n$ we have $\gamma_{obs}(v'_i) = 
obs_{pred}$. The latter is true iff there exists a sequence of transitions 
such that $s_1^1 \in \bigcup_{v' \in v'_{pred}} v'$ and for any $i, j$ we have $\gamma_{obs}(s_i^j) = obs_{pred}$ and 
$\gamma_{obs'}(s_i^{c_i}) \neq \gamma_{obs'}(s_{i+1}^1)$. We call such a sequence a proof sequence for s' in $G_c^2$. 



Now it is easy to see that the sets $v_{succ}$ and $\bigcup_{v' \in v'_{succ}} v'$ coincide, since any 
proof sequence for s in $G_c^2$ is a proof sequence for s in $G_c^1$. At the same time, any 
proof sequence $s_1 \xrightarrow{a} s_2 \xrightarrow{a} \dots \xrightarrow{a} s_n \xrightarrow{a} s$ for s in $G_c^1$ is a proof sequence for s 
in $G_c^2$, if we compute the values of $c_i$ sequentially based on the changes of the 
$\gamma_{obs'}$ function. More formally, we choose $c_k$ to be the largest integer such that 
$\gamma_{obs'}(s_{m+i}) = \gamma_{obs'}(s_{m+j})$ for $m = \sum_{i<k} c_i$ and any $i \leq c_k$, $j \leq c_k$. 

This shows that $(v_{succ}, v'_{succ}) \in R$ and proves that R is a bisimulation. □ 
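
The relation R of Theorem 4 can be illustrated directly on sets. In the minimal Python sketch below (the names flatten and related are hypothetical), a state of the game built on top of the finer knowledge game is a collection of sets of LTS states, and it is related to a state of the game built directly from the LTS exactly when flattening it gives the same set of LTS states.

```python
def flatten(v_prime):
    """Union of all the sets of LTS states contained in the state v'."""
    return frozenset(s for inner in v_prime for s in inner)


def related(v, v_prime):
    """(v, v') is in R iff v equals the union of the members of v'."""
    return frozenset(v) == flatten(v_prime)


if __name__ == "__main__":
    v = {"s1", "s2", "s3"}
    v_prime = [{"s1", "s2"}, {"s3"}]
    print(related(v, v_prime))
```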

Corollary 1. Player I wins in $(G_c^1, \psi_c^1)$ iff Player I wins in $(G_c^2, \psi_c^2)$. 

Proof. According to Theorem 4, there is a bisimulation relation $R = \{(v, v') \mid v = 
\bigcup_{s' \in v'} s'\}$ between $G_c^1$ and $G_c^2$. 
It is easy to see that the bisimulation relation preserves the satisfiability of the 
$\psi_c^1$ and $\psi_c^2$ formulas, i.e. for any pair of states $(v, v') \in R$, $v \models \psi_c^1$ iff $v' \models \psi_c^2$. 
Therefore Player I wins in $(G_c^2, \psi_c^2)$ iff Player I wins in $(G_c^1, \psi_c^1)$. □ 



